Abstract

An alternative to the matrix inverse procedure is presented. Given an arbitrarily
large bit register, the inverse of an arbitrarily large matrix can be performed in
$O(N^2)$ operations, and the multiplication of a matrix on a vector in $O(N)$.
This is in contrast to the usual $O(N^3)$ and $O(N^2)$. A finite size bit register can
lead to speedups of an order of magnitude in large matrices such as $500 \times 500$.
The FFT can be improved from $O(N \ln N)$ to $O(N)$ steps, or even fewer steps in a
modified butterfly configuration.



The matrix inverse is the backbone of the modern STAP process; it has to be
performed every time a set of data is trained in the field of view. Its complexity
is one of the primary origins of the cost of the large computing resources required
to process the data. Any improvement in the complexity of performing the matrix
inverse is desirable.

Typically the inverse of an $N \times N$ matrix $M$ is performed in $O(N^3)$
operations. There are several variants, including the LU and QR variants. The LU
(Gauss) factorization is roughly twice as fast as the QR form. The LU factorization
requires splitting the matrix into the product of lower and upper triangular
matrices, $M = LU$. The inverse is performed by inverting the respective components.
The QR form requires splitting the matrix $M$ into an orthogonal component $Q$ times
an upper triangular factor $R$.

There is a simplification of the matrix inverse by grouping the entries of the matrix
$M_{ij}$ into larger numbers. For example, the first row of the matrix has elements
$N_1, N_2, \ldots$. A larger number can be built of these entries by placing the digits
into one number $N_1 N_2 \ldots$. For computing purposes a zero number $N_0$ with as
many digits as the entries $M_{ij}$ is required, and the rows are grouped into the
number $N_1 N_0 N_2 N_0 \ldots$. The rows of the matrix are now used as a single number
in the diagonalization procedure in the LU factorization.
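
The grouping of a row into one number can be sketched as follows. This is a minimal
illustration, not the author's implementation: Python's arbitrary-precision integers
stand in for the arbitrarily large bit register, the helper names `pack` and `unpack`
are hypothetical, and the field width (entry digits plus the zero padding $N_0$) is
measured in bits rather than decimal digits.

```python
def pack(entries, width):
    """Pack non-negative integers into one integer, `width` bits per field.

    `width` covers both the entry itself and its zero padding (assumption).
    """
    packed = 0
    for value in entries:
        if value < 0 or value >= (1 << width):
            raise ValueError("entry does not fit in its field")
        packed = (packed << width) | value
    return packed


def unpack(packed, count, width):
    """Recover `count` fields of `width` bits, first entry first."""
    mask = (1 << width) - 1
    fields = [(packed >> (width * k)) & mask for k in range(count)]
    return fields[::-1]


row = [3, 5, 7]
packed_row = pack(row, 16)
recovered = unpack(packed_row, len(row), 16)
```

The padding has to be wide enough that later products and sums of the entries never
spill into the neighbouring field.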

For example, zeroing out the matrix's first column requires using the first row; a
number $b_{i1}$ is used to multiply $M_{1j}$, with $b_{i1} = -M_{i1}/M_{11}$. This
number multiplies the entire row $M_{1j}$ of the matrix and the result is added to the
$i$th row. In doing so, a set of zeroes is produced in the first column of the matrix
$M$; the numbers $b_{i1}$ are placed in the lower triangular factor matrix $L$. The
procedure is iterated using the diagonal elements $M_{jj}$ to construct the lower and
upper triangular matrices $L$ and $U$. The computational cost of zeroing one column is
$N^2$, due to the $N$ elements in the row and the multiplications and additions applied
for each of the $N$ elements in the column.

In using the larger number, the $N^2$ operations to create a zero column can be
reduced to $N$ operations. This requires the zero number $N_0$ to have a sufficient
number of digits so that

$$b\,(N_1 N_0 N_2 N_0 \ldots) = (b N_1) N_0 (b N_2) N_0 \ldots \tag{1}$$

$$(N_1 N_0 N_2 N_0 \ldots) + (M_1 M_0 M_2 M_0 \ldots) = (N_1 + M_1) N_0 (N_2 + M_2) N_0 \ldots , \tag{2}$$

as one number. The bit register in the processor has to be able to handle these two
operations, multiplication by a scalar and addition. The $N$ operations in the bit
register to treat the multiplication and addition of the original row have been reduced
to one multiplication and one addition. The numbers $b$ which multiply the larger
numbers $N_1 N_0 N_2 N_0 \ldots$ are collected into the lower triangular matrix $L$.
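
The two register operations, scaling a grouped row and adding two grouped rows, can
be checked on a small case. This is a sketch under stated assumptions: Python integers
stand in for the large register, `pack`, `unpack`, and `WIDTH` are hypothetical names,
and the values are chosen small enough that no field overflows its padding.

```python
WIDTH = 16  # bits per field: entry plus zero padding (assumed value)


def pack(entries, width):
    """Pack non-negative integers into one integer, `width` bits per field."""
    packed = 0
    for value in entries:
        packed = (packed << width) | value
    return packed


def unpack(packed, count, width):
    """Recover `count` fields of `width` bits, first entry first."""
    mask = (1 << width) - 1
    return [(packed >> (width * k)) & mask for k in range(count)][::-1]


row_n = pack([3, 5, 7], WIDTH)     # N_1, N_2, N_3
row_m = pack([10, 20, 30], WIDTH)  # M_1, M_2, M_3
b = 4

scaled = b * row_n       # one multiplication scales every field at once
summed = scaled + row_m  # one addition updates every field at once

scaled_fields = unpack(scaled, 3, WIDTH)
summed_fields = unpack(summed, 3, WIDTH)
```

On real hardware a multiplication of such a wide integer is not a single constant-time
step; the sketch only shows that the field-wise results come out as the text describes
when the padding is sufficient.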

A separate matrix is required to record whether the subtraction of a positive number
from this number gives a negative or positive result. For example,

$$(N_1 N_0 N_2 N_0 \ldots) - (M_1 M_0 M_2 M_0 \ldots) \tag{3}$$

could have negative entries $N_i - M_i$, but the absolute value is used in the
composition of the number $N - M$. The subtraction process doesn't work well in the
procedure, and the numbers are separated into $N_j - M_j$ independently.
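
A small case shows why subtraction of grouped numbers fails when one field goes
negative: the borrow propagates into the neighbouring field. The sketch below assumes
Python integers as the register and 8-bit fields; `pack` and `unpack` are hypothetical
helper names.

```python
def pack(entries, width):
    """Pack non-negative integers into one integer, `width` bits per field."""
    packed = 0
    for value in entries:
        packed = (packed << width) | value
    return packed


def unpack(packed, count, width):
    """Recover `count` fields of `width` bits, first entry first."""
    mask = (1 << width) - 1
    return [(packed >> (width * k)) & mask for k in range(count)][::-1]


n_entries = [5, 2]
m_entries = [1, 3]

# Subtracting the grouped numbers: the second field (2 - 3) is negative,
# so it borrows from the first field and both fields come out wrong.
grouped_difference = unpack(pack(n_entries, 8) - pack(m_entries, 8), 2, 8)

# Subtracting the entries independently gives the intended differences.
separate_difference = [n - m for n, m in zip(n_entries, m_entries)]
```

Here the grouped subtraction yields the fields 3 and 255 instead of 4 and -1, which is
the motivation for handling the differences $N_j - M_j$ separately.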

The advantage of processing with the larger numbers instead of the smaller numbers is
that typical processes such as the matrix inverse and the FFT can be reduced in
complexity from $O(N^3)$ and $N \ln N$ to $O(N^2)$ and $N$.

STAP Example 

The use of space-time adaptive processing (STAP) requires the training of data using a
covariance matrix. This matrix is canonically symmetric in the acquisition of data,
satisfying the multiplicative product $X_{ij} = X_i X_j$. The inverse of the covariance
matrix is unwieldy, being performed in $O(N^3)$ steps, but must be performed in
conventional STAP processes and signal location.

The product $X_i X_j$ represents a probability distribution, with positive entries. An
alternative matrix $\tilde{X}$, which satisfies $\tilde{X}_{ji} = -\tilde{X}_{ij}$ off
the diagonal with positive entries along the diagonal, is far more convenient in the
matrix inverse procedure. The inverse of the latter matrix can be performed
theoretically in $O(N^2)$ steps. The limitation is set by the number of bits in the bit
register; the $O(N^2)$ arises from an arbitrarily large bit register.

Consider the LU reduction of the alternative covariance matrix $\tilde{X}$. The process
of adding the rows to null the lower left triangular portion of $U$ requires only adding
the numbers $(M_1/N_1)\,(N_1 N_0 N_2 N_0 \ldots)$ to $(M_1 M_0 M_2 M_0 \ldots)$. For
example, the second row, modeled by the single number $(M_1 M_0 M_2 M_0 \ldots)$, has a
negative entry for $M_1$ and positive entries $M_j$. The addition of $(M_1/N_1)\,N_j$ to
the $M_j$ occurs in one operation theoretically, due to the size of the individual
numbers. The $M_1/N_1$ is stored in the lower left triangular matrix $L$ in the process
of the LU factorization. The operation is repeated to first nullify the left column of
$U$ (except the diagonal component), and then the process is repeated for the other
columns. As there are $N^2$ components in the matrix, the LU factorization of
$\tilde{X}$ requires $N^2$ steps; this is considerably faster than $N^3$ when $N$ is of
the order of a thousand or more.
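
One elimination step of this kind can be sketched on a $3 \times 3$ example. This is
one possible reading of the procedure, not a confirmed implementation: the multiplier
is taken as the magnitude $|M_1|/N_1$ and chosen to be an integer, only the entries to
the right of the zeroed column are grouped, Python integers stand in for the register,
and `pack`, `unpack`, and `WIDTH` are hypothetical names.

```python
WIDTH = 16  # assumed field width in bits, including zero padding


def pack(entries, width):
    """Pack non-negative integers into one integer, `width` bits per field."""
    packed = 0
    for value in entries:
        packed = (packed << width) | value
    return packed


def unpack(packed, count, width):
    """Recover `count` fields of `width` bits, first entry first."""
    mask = (1 << width) - 1
    return [(packed >> (width * k)) & mask for k in range(count)][::-1]


pivot_row = [2, 3, 1]    # first row: N_1, N_2, N_3
target_row = [-4, 5, 6]  # second row: negative M_1, positive M_2, M_3

# Multiplier stored in L; taken as the magnitude |M_1| / N_1 (assumption).
multiplier = abs(target_row[0]) // pivot_row[0]

# Group only the entries to the right of the column being zeroed.
packed_pivot = pack(pivot_row[1:], WIDTH)
packed_target = pack(target_row[1:], WIDTH)

# One multiplication and one addition update the rest of the row.
packed_result = packed_target + multiplier * packed_pivot

first_entry = target_row[0] + multiplier * pivot_row[0]
new_row = [first_entry] + unpack(packed_result, 2, WIDTH)
```

Because the first entry of the second row is negative and the rest are positive, the
update uses only additions of positive grouped numbers, which avoids the borrow problem
of subtraction.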

The question is whether data can be trained with the alternative to the covariance
matrix. The matrix $\tilde{X}$ contains the same information but with minus signs
placed to partially antisymmetrize it. It appears, from the conventional use of STAP
in locating signals and in eliminating noise, that this should be possible.

Matrix Multiplication 

The same use of the bit register, organizing the rows of the matrix in terms of whole
numbers, can be used to simplify matrix multiplication. The usual multiplication of a
matrix by a vector requires $O(N^2)$ steps. This can be reduced to $O(N)$ with a
reordering of the matrix and vector information.

Consider all positive entries in the matrix $M$ and all positive entries in the vector
$v$. The vector consists of one number $(v_1 v_0 v_2 v_0 \ldots)$, and the columns of
the matrix consist of individual numbers $M_j = (M_1 M_0 M_2 M_0 \ldots)$. The
multiplication of one column by the vector element is accomplished in one step,
$v_j M_j$, with the $j$th element of the vector used. This results in
$v_j M_j = (v_j M_1 \, M_0 \, v_j M_2 \, M_0 \ldots)$. The total matrix multiplication is
then accomplished by adding the previous multiplications: $\sum_j v_j M_j$. The
resultant vector of the matrix multiplication is stored in the decomposition of
$\sum_j v_j M_j$. The total number of operations to matrix multiply is $2N$ steps, and
not $O(N^2)$. This reduction can be substantial for numbers $N$ of the order of a
thousand.
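
The column-wise product can be sketched as below. It is an illustration under stated
assumptions: Python integers stand in for the large register, the columns are assumed
to be grouped in advance (the grouping itself is not counted in the $2N$ steps), and
`pack`, `unpack`, and `packed_matvec` are hypothetical names.

```python
def pack(entries, width):
    """Pack non-negative integers into one integer, `width` bits per field."""
    packed = 0
    for value in entries:
        packed = (packed << width) | value
    return packed


def unpack(packed, count, width):
    """Recover `count` fields of `width` bits, first entry first."""
    mask = (1 << width) - 1
    return [(packed >> (width * k)) & mask for k in range(count)][::-1]


def packed_matvec(matrix, vector, width):
    """Multiply a non-negative integer matrix by a non-negative integer vector."""
    n_rows = len(matrix)
    # Group each column of the matrix into a single number M_j.
    columns = [pack([row[j] for row in matrix], width) for j in range(len(vector))]
    total = 0
    for v_j, column in zip(vector, columns):
        total += v_j * column  # one multiplication and one addition per column
    # The result vector is read off from the fields of the accumulated sum.
    return unpack(total, n_rows, width)


result = packed_matvec([[1, 2], [3, 4]], [5, 6], 16)
```

The loop performs $N$ multiplications and $N$ additions on grouped numbers, matching
the $2N$ count in the text, provided the field width leaves room for the largest entry
of the result.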

The previous example pertains to the matrix $M$ and vector consisting of positive
values. Minus signs in the matrix can be incorporated very simply by separating
the elements $M_j = (M_1 M_0 M_2 M_0 \ldots)$ into the respective positive entries and
negative entries: $M_j^+$ and $M_j^-$. A vector with positive entries can be used to
multiply both, $v_j M_j = v_j M_j^+ + v_j M_j^-$. The addition of the column vectors is
achieved by adding separately the positive entries and the negative entries:
$\sum_j v_j M_j^+$ and $\sum_j v_j M_j^-$. Then the individual entries of the two terms
are required to be subtracted. The net total number of steps is $O(N)$.
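
The handling of minus signs can be sketched in the same way. The assumptions are as
before (Python integers as the register, hypothetical helper names), with one more:
the negative part of each column is stored by magnitude, so that both grouped sums hold
only non-negative fields and the final subtraction is done entry by entry.

```python
def pack(entries, width):
    """Pack non-negative integers into one integer, `width` bits per field."""
    packed = 0
    for value in entries:
        packed = (packed << width) | value
    return packed


def unpack(packed, count, width):
    """Recover `count` fields of `width` bits, first entry first."""
    mask = (1 << width) - 1
    return [(packed >> (width * k)) & mask for k in range(count)][::-1]


def signed_packed_matvec(matrix, vector, width):
    """Multiply a signed integer matrix by a non-negative integer vector."""
    n_rows = len(matrix)
    positive_total = 0
    negative_total = 0
    for j, v_j in enumerate(vector):
        column = [row[j] for row in matrix]
        # Positive entries and magnitudes of negative entries, grouped separately.
        positive_total += v_j * pack([max(x, 0) for x in column], width)
        negative_total += v_j * pack([max(-x, 0) for x in column], width)
    positive = unpack(positive_total, n_rows, width)
    negative = unpack(negative_total, n_rows, width)
    # The two sums are subtracted entry by entry, outside the grouped form.
    return [p - q for p, q in zip(positive, negative)]


signed_result = signed_packed_matvec([[1, -2], [-3, 4]], [5, 6], 16)
```

The final entry-wise subtraction costs $N$ further steps, so the total remains of
order $N$.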

A simple application of the matrix multiplication of a vector is the fast Fourier
transform. The butterfly reduction of the usual multiplication of the vector lowers
the $O(N^2)$ to $O(N \ln N)$ steps. Theoretically, separating the matrix into real and
imaginary parts, with the minus signs handled separately, can achieve an FFT in $O(N)$
steps. This is faster than the butterfly configuration by a factor of order $\ln N$.

The butterfly configuration can also be analyzed with subtle memory allocation of the
data transfer, reducing it to approximately $\ln N$ operations. The data has to be
reordered in traversing the butterfly.

Conclusions 

The theoretical improvements in the matrix inverse from $O(N^3)$ steps to $O(N^2)$
steps, and in matrix times vector from $O(N^2)$ to $O(N)$ steps, have a profound impact
in computational science. Unfortunately, a bit register of a large size is required. In
conventional computing registers, there is a waste depending on the data size. For
example, a 256 bit register handling 32 bit data can be optimized by a factor of 8,
which is still substantial.

Ideally, the theoretical drop in the cost of the matrix inverse and the matrix
multiplication by a factor of $N$ is suited to more advanced computing apparatus. The
theoretical bounds in the optimization are achieved with large bit registers, which can
be designed in several contexts.